security: fix critical RCE, IDOR, SSRF, webhook, and rate-limit issues by madhavcodez · Pull Request #2 · madhavcodez/agentary

madhavcodez · 2026-05-18T03:59:39Z

Summary

PR 2 of 4 in the critical-cleanup sweep. Addresses every CRITICAL and the seven highest-severity HIGH findings from the parallel security audit.

Pairs with the CI gates landed in #1.

CRITICAL fixes

1. Python executor RCE (`services/crews/tools/python_executor.py`)

The previous blocklist was a substring search on the raw code string. Bypassable in seconds:

m = __import__('o' + 's')              # split string defeats `import os` check
importlib.import_module('subprocess')  # importlib wasn't in the blocklist
open('/etc/passwd').read()             # open was never blocked
().__class__.__bases__[0].__subclasses__()  # dunder traversal to subprocess.Popen

Fix: Replaced with RestrictedPython.compile_restricted — AST-level validation that rejects __import__, exec, eval, dunder traversal, and attribute writes at compile time. Globals are limited to safe_builtins + limited_builtins + utility_builtins plus a name-bound stdlib allowlist (math, statistics, collections, itertools, etc.). Stdout captured by PrintCollector. Thread-based wall-clock timeout.

2. Twilio webhook signature validation (`api/voice_webhooks.py`)

All three endpoints (voice-status, voice-recording, voice-transcription) had zero signature validation. Anyone with the URL could:

Inject arbitrary transcripts for any guessable call_sid
Forge call statuses (mark completed, failed)
Trigger voice_service.process_completed_call against arbitrary records

Fix: New core/webhook_security.verify_twilio_signature — HMAC-SHA1 over the sorted form params + the public URL, signed with TWILIO_AUTH_TOKEN. Uses TWILIO_WEBHOOK_BASE_URL to reconstruct the URL Twilio actually signed when behind a reverse proxy. All three endpoints Depends() on the verifier, so requests are 403'd before any DB access.

3. Resend webhook HMAC (`api/webhooks.py`)

RESEND_WEBHOOK_SECRET was configured but never imported by the handler. Anyone could POST forged bounce/complaint events to suppress legitimate emails or fabricate EmailEvent rows.

Fix: Svix-style HMAC-SHA256 over {webhook_id}.{timestamp}.{body} with 5-minute replay tolerance. Handles the whsec_ prefix and both svix-* / webhook-* header casings.

4. `init_db()` silent failure (`database.py`)

except Exception:
    pass  # Tables already exist from migrations

Swallowed all exceptions including wrong-DSN/permission/network errors — boot succeeded with a dead DB.

Fix: Replaced with an explicit SELECT 1 connectivity check that raises RuntimeError on failure. Base.metadata.create_all() removed — Alembic is the single source of truth for schema, which closes the silent migration-drift bug.

HIGH fixes

5. SSRF on outbound HTTP (`core/url_guard.py`, both `web_scraper.py`)

Both scrapers used httpx.AsyncClient(follow_redirects=True) with no IP filter. Prompt-injectable into hitting http://169.254.169.254/... (AWS IMDS) or any internal service.

Fix: New url_guard module with assert_safe_url + safe_http_get:

Rejects non-http/https schemes (file://, gopher://, etc.)
Resolves all DNS records; rejects if any is loopback / link-local / RFC-1918 / multicast / reserved
safe_http_get follows redirects manually with per-hop revalidation — closes the "benign hostname 302s to IMDS" bypass

6–8. IDOR fixes (`api/findings.py`, `api/voice.py`, `api/missions.py`)

All three returned data filtered only by their resource ID, never by user_id. An authenticated attacker could enumerate other users' findings, voice extractions, call transcripts, and mission runs by guessing UUIDs.

Fix: Each list/get/create now joins through Project (or Mission for runs) on user_id. voice.py factored the ownership check into a shared _owned_voice_extraction helper. findings.py and voice.py also gained limit/offset pagination.

9. Auth brute-force (`api/auth.py`)

/auth/login and /auth/register had zero rate limiting despite slowapi being installed.

Fix: @limiter.limit("5/minute") on login, @limiter.limit("3/minute") on register.

10. LLM-controlled phone numbers (`core/phone_guard.py`, `voice_caller.py`)

The LLM could supply any string to voice_caller's phone_number parameter — including premium-rate (+1-900), international, and emergency lines. Real cost / TCPA exposure.

Fix: New phone_guard.validate_outbound_number:

Parses via phonenumbers
Region must be in US/CA allowlist (configurable)
Number type must be FIXED_LINE / MOBILE / TOLL_FREE / VOIP
Explicit denylist: 1-900, 1-976, 1-809, 9xx emergency short codes
Failure returns a structured tool-error so the LLM can self-correct

11. Voice server CORS (`voice/server.py`)

allow_origins=["*"] with a WebSocket / future-credentialed surface = cross-site hijacking from any origin.

Fix: Parsed settings.allowed_origins allowlist, narrowed methods/headers.

Tests

backend/tests/test_security_guards.py — unit coverage for each new primitive:

7 python_executor tests confirming the historic bypasses now fail
6 url_guard tests for loopback, IMDS, RFC-1918, IPv6 loopback, file://, missing host
5 phone_guard tests for 1-900, 911, +44, empty input, valid US mobile
3 webhook_security tests for Twilio reference match, Resend valid signature, Resend replay rejection

New dependencies

RestrictedPython>=7.4 — the sandbox itself
twilio>=9.0.0 — used elsewhere already; pinned here for the SDK
phonenumbers>=8.13.0 — E.164 parsing

What this does NOT fix (deferred to PR 3)

These are reliability/perf, not security:

APScheduler in-process multi-fire under --workers >1 → PR 3
Missing DB indexes → PR 3
Sync db.query in async routes → PR 3
asyncio.ensure_future (deprecated) → PR 3

Test plan

CI green on this branch
test_security_guards.py passes
Manual: hit /webhooks/twilio/voice-status with no signature → expect 403
Manual: hit /webhooks/resend with a stale svix-timestamp → expect 403
Manual: invoke python_executor with __import__('o'+'s') → expect status: error
Manual: voice_caller with phone_number: "+19005551234" → expect rejection

References

Security review CRITICAL chore: bootstrap operational hygiene (CI, hooks, templates, configs) #1-4, HIGH chore(deps): bump actions/setup-node from 4 to 6 #5-11
Architecture review G5 (init_db silent fail)

Addresses the four CRITICAL and seven HIGH security findings from the parallel security audit. Each fix is scoped and individually testable. CRITICAL — Python executor sandbox (RCE) - services/crews/tools/python_executor.py: replace the substring blocklist (bypassable by ``__import__('o'+'s')``, ``importlib.import_module``, ``open()``) with RestrictedPython AST-level compilation, curated safe_builtins + limited_builtins + utility_builtins, name-bound stdlib allowlist (math/statistics/etc.), PrintCollector for stdout capture, thread-based wall-clock timeout CRITICAL — Webhook signature verification - core/webhook_security.py: new module - verify_twilio_signature: HMAC-SHA1 over sorted form params + public URL against TWILIO_AUTH_TOKEN; uses TWILIO_WEBHOOK_BASE_URL to reconstruct the URL Twilio signed when behind a proxy - verify_resend_signature: Svix-style HMAC-SHA256 over {id}.{timestamp}.{body} with 5-minute replay window; handles ``whsec_`` prefix; accepts both ``svix-*`` and ``webhook-*`` headers - api/voice_webhooks.py: all three Twilio endpoints now depend on verify_twilio_signature; pre-403 before any DB access - api/webhooks.py: reads raw body once, validates Resend signature, then parses JSON; previously RESEND_WEBHOOK_SECRET was configured but unused CRITICAL — Silent DB-failure boot - database.py: replace ``except Exception: pass`` in init_db() with a loud connectivity check (``SELECT 1``); raise RuntimeError on failure. Alembic is the authority for schema; create_all() removed entirely to prevent migration drift from being papered over at boot HIGH — SSRF on outbound HTTP from agent tools - core/url_guard.py: assert_safe_url + safe_http_get - rejects file:// / non-http schemes - rejects RFC-1918, loopback, link-local (169.254/16 — cloud IMDS), multicast, reserved, unspecified - resolves all DNS records (defends against TOCTOU and multi-record attacks) - safe_http_get follows redirects manually with per-hop revalidation; closes the "benign hostname redirects to IMDS" bypass - services/crews/tools/web_scraper.py: uses safe_http_get with follow_redirects=False - services/data_sources/connectors/web_scraper.py: same; UnsafeURLError caught alongside httpx errors in the unified except chain HIGH — IDOR fixes - api/findings.py: list_findings now joins Project on user_id; create_finding verifies project ownership; cursor pagination added (limit/offset) - api/voice.py: ``_owned_voice_extraction`` helper centralises the Project-join ownership check; list/get/calls and create all route through it - api/missions.py: list_mission_runs now verifies Mission.user_id == user.id before returning runs HIGH — Auth brute-force protection - api/auth.py: ``@limiter.limit("5/minute")`` on login, ``@limiter.limit("3/minute")`` on register — previously unlimited HIGH — LLM-controlled phone numbers - core/phone_guard.py: validate_outbound_number - parses + validates E.164 via phonenumbers - rejects regions outside US/CA allowlist - rejects PREMIUM_RATE / VOICEMAIL / PAGER / SHARED_COST number types - explicit denylist for 1-900/1-976/1-809 prefixes and 9xx emergency short codes - services/crews/tools/voice_caller.py: validates phone_number before any Twilio path; failure returns a structured tool-error so the LLM can self-correct rather than retry HIGH — Voice server CORS - voice/server.py: ``allow_origins=["*"]`` replaced with parsed ``settings.allowed_origins`` allowlist; methods/headers narrowed Tests - backend/tests/test_security_guards.py: unit tests for each new module - Confirms the historic bypasses are now blocked (__import__ split, importlib, open, exec, dunder traversal) - SSRF guard rejects loopback, IMDS, RFC-1918, IPv6 loopback, file:// - Phone guard rejects 1-900, 911, +44, empty - Twilio signature helper matches reference implementation - Resend signature verifies on valid payload; rejects 1-hour-old replay Dependencies - RestrictedPython >= 7.4 - twilio >= 9.0.0 (only for shared TwiML helpers; signature done locally) - phonenumbers >= 8.13.0

…ecycle (#3) Addresses the highest-leverage findings from the parallel performance and reliability review. Each change is independently revertible. Lifespan refactor (CRITICAL) - main.py: replace the warn-and-continue init pattern with a SubsystemReadiness tracker. Required subsystems (database) fail-fast and abort startup; optional subsystems (expert seeding, source_registry, scheduler, redis_bridge) degrade gracefully and expose their state via the new /health/ready endpoint - Replace ``next(get_session())`` with ``with SessionLocal() as db:`` so the session is properly cleaned up on every path - Remove the ``stop_scheduler = lambda: None`` rebind smell; the default is defined up-front and overwritten only on successful start - redis_task tracked by name and awaited on shutdown so cancellation is observable Health surface - api/health.py: new GET /health/ready exposes per-subsystem readiness from the lifespan handler. Operators can detect degraded boots without digging through logs Tool-result caching (CRITICAL — wired existing infra) - core/tool_cache.py: new fail-open Redis cache keyed on (tool, params) with deterministic JSON hashing, lazy connection, per-call TTL, and ``cached=True`` flag on hits - services/crews/tools/exa_search.py: wraps the real fetch in tool_cache.cached with 1h TTL. Previously every mission re-paid Exa for the same queries - services/crews/tools/gemini_search.py: same treatment for Gemini grounding calls Celery worker lifecycle (HIGH) - celery_app.py: ``worker_max_tasks_per_child=10`` recycles each worker process after 10 tasks to release accumulated LLM-context heap that workers don't free between tasks. A research-heavy worker grew several GB RSS over a day without this - ``task_soft_time_limit=300`` + ``task_time_limit=360`` so a stuck HTTP call (Gemini latency spike, Twilio webhook timeout) cannot hang a queue indefinitely Database indexes (HIGH) - alembic/versions/f4a8d2c1e9b0_add_perf_indexes.py: adds - ix_missions_status (backs ?status=running dashboard filter) - ix_missions_user_created (backs /missions home-page query) - ix_findings_mission_confidence (backs per-mission finding list with confidence-ordered display) Background task lifecycle (HIGH) - core/background_tasks.py: ``spawn_background_task`` keeps a strong reference in a module-level set + done-callback that logs uncaught exceptions. ``asyncio.ensure_future`` (deprecated 3.10+) without reference retention is GC-collectable; tasks would silently disappear - api/missions.py, api/projects.py: three ``asyncio.ensure_future`` call sites migrated to ``spawn_background_task`` Error-envelope sanitisation (HIGH) - api/data_sources.py:183,266 + api/workflows.py:262: stop leaking raw exception strings in 5xx responses. ``logger.exception(...)`` captures the full traceback server-side; the user gets a generic message + a hint to check logs by correlation id Pagination on unbounded list endpoint (HIGH) - api/run_steps.py: ``limit``/``offset`` parameters (default 200, max 1000) so a long-running mission's run trace doesn't return a multi-MB blob Deferred to a follow-up PR - Wrapping the broader async-route sync ``db.query`` calls in ``run_in_threadpool`` (or migrating to asyncpg) needs touching ~33 router modules and warrants its own PR with a focused test plan - Moving APScheduler jobs to Celery Beat (CRITICAL #2 in perf review) requires reconciling autopilot/monitor/workflow job loaders — also deferred - ReactFlow lazy-load + ActivityFeed memoization land with the broader dashboard polish PR Co-authored-by: Madhav Chauhan <j_s_chauhan@hotmail.com>

This was referenced May 18, 2026

perf+reliability: lifespan fail-fast, tool caching, indexes, task lifecycle #3

Merged

chore: prune dead code, remove planning archive, relocate Dockerfiles #4

Merged

madhavcodez merged commit 4a6b20f into main May 18, 2026

madhavcodez deleted the cleanup/phase-1-security branch May 18, 2026 04:31

madhavcodez mentioned this pull request May 18, 2026

refactor: unify Exa + Gemini SDK access behind platform/infrastructure/providers #15

Open

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

security: fix critical RCE, IDOR, SSRF, webhook, and rate-limit issues#2

security: fix critical RCE, IDOR, SSRF, webhook, and rate-limit issues#2
madhavcodez merged 1 commit into
mainfrom
cleanup/phase-1-security

madhavcodez commented May 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

madhavcodez commented May 18, 2026

Summary

CRITICAL fixes

1. Python executor RCE (services/crews/tools/python_executor.py)

2. Twilio webhook signature validation (api/voice_webhooks.py)

3. Resend webhook HMAC (api/webhooks.py)

4. init_db() silent failure (database.py)

HIGH fixes

5. SSRF on outbound HTTP (core/url_guard.py, both web_scraper.py)

6–8. IDOR fixes (api/findings.py, api/voice.py, api/missions.py)

9. Auth brute-force (api/auth.py)

10. LLM-controlled phone numbers (core/phone_guard.py, voice_caller.py)

11. Voice server CORS (voice/server.py)

Tests

New dependencies

What this does NOT fix (deferred to PR 3)

Test plan

References

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

1. Python executor RCE (`services/crews/tools/python_executor.py`)

2. Twilio webhook signature validation (`api/voice_webhooks.py`)

3. Resend webhook HMAC (`api/webhooks.py`)

4. `init_db()` silent failure (`database.py`)

5. SSRF on outbound HTTP (`core/url_guard.py`, both `web_scraper.py`)

6–8. IDOR fixes (`api/findings.py`, `api/voice.py`, `api/missions.py`)

9. Auth brute-force (`api/auth.py`)

10. LLM-controlled phone numbers (`core/phone_guard.py`, `voice_caller.py`)

11. Voice server CORS (`voice/server.py`)